FEAT: Tool Use + MCP by ValbuenaVC · Pull Request #1811 · microsoft/PyRIT

ValbuenaVC · 2026-05-26T21:03:20Z

Description

PyRIT's existing tool-calling story is fragmented:

OpenAIChatTarget parses tool_calls into function_call pieces and stops — no execution, no loop.
OpenAIResponseTarget hand-rolls a complete agentic loop inside _send_prompt_to_target_async, accepts a custom_functions registry of Python callables, and dispatches one tool call per turn.
MCP servers are not a recognized concept.

This PR introduces a single, target-agnostic tool-use primitive that any PromptTarget subclass can opt into:

New pyrit/tools/ package with a tool_loop decorator wired onto PromptTarget.send_prompt_async, a ToolCallParser protocol (per-target detection), and a ToolBackend ABC with two concrete backends — LocalToolBackend (in-process Python callables) and MCPToolBackend (stdio MCP servers via the official mcp SDK).
New TargetCapabilities.supports_tool_use capability flag plus ToolEventPolicy (EXECUTE / RAISE / RETURN_RAW) and a tool_backend slot on TargetConfiguration. The policy lets red-team callers observe attempted tool use without executing it, or hand the raw response back untouched.
OpenAIResponseTarget migrated onto the decorator. The in-class agentic loop is gone; _send_prompt_to_target_async returns exactly one Message per call and the decorator stitches multi-turn iterations into the response list. Multiple tool calls in a single turn are now dispatched all-at-once sequentially — the protocol-intended behavior.
An InlineToolCallParser that walks text pieces for marker-delimited JSON blocks (configurable regex; defaults to angle-bracket syntax). Non-OpenAI deployments that emit tool calls inline in generated text can opt in by supplying this as _tool_parser.
AzureMLChatTarget and HuggingFaceChatTarget gain optional tool_parser and tool_backend constructor kwargs that opt them into the decorator without subclassing. Supplying a parser flips supports_tool_use=True on the default capabilities so callers don't need a custom_configuration just to enable tool use. The two targets use different wire-format wrappings (AzureMLChatTarget wraps schemas in the OpenAI Chat Completions {"type":"function","function":{...}} envelope; HuggingFaceChatTarget passes bare schemas straight into tokenizer.apply_chat_template).
ChatMessageNormalizer now serializes function_call and function_call_output pieces into the OpenAI Chat Completions wire shape (assistant message with tool_calls; role="tool" message with tool_call_id). This is what makes the chat-completions-shaped targets above able to round-trip tool conversations through @tool_loop without target-side translation code.
custom_functions kwarg on OpenAIResponseTarget is deprecated (removed_in="0.16.0"); internally rewrapped as a LocalToolBackend so the legacy path keeps working through one release cycle.

OpenAIChatTarget is intentionally left as-is. The Responses API is the modern agentic surface for OpenAI; new tool-calling investment there would age poorly. Targets that need tool calling for non-Responses-API endpoints opt into the decorator by supplying a parser and a backend.

Future MCP transports (HTTP/SSE, Docker sandbox), additional sandbox providers, and streaming all plug in behind the existing ToolBackend / MCPServerSpec interfaces with no abstraction changes. The MCPServerSpec union ships with three variants: LocalMCPServerSpec (the only one with a working transport) plus stub declarations of RemoteMCPServerSpec and DockerMCPServerSpec whose connect_async raises NotImplementedError. Future PRs implement an already-declared variant rather than expanding the union.

Tracks deferred work via TODOs marked # TODO(streaming-v2), # TODO(mcp-http-transport), # TODO(mcp-resources), and # TODO(sandbox-provider).

Compatibility

This PR is not breaking for the standard tool-calling path. Compatibility caveats reviewers should know about:

Source-compat — PromptTarget.send_prompt_async is @final. External subclasses that override the public entrypoint (not just _send_prompt_to_target_async) will fail to import. No in-tree target overrides it today.
Deprecation — OpenAIResponseTarget(custom_functions=...). The kwarg now emits DeprecationWarning(removed_in="0.16.0") and is internally rewrapped as a LocalToolBackend. No runtime behavior change in the current release cycle.
Intentional behavior change — multi-call-per-turn dispatch on the Response target. When a model response contains N>1 tool calls, the new loop dispatches all N sequentially in declaration order. The previous hand-rolled loop only dispatched the last call per turn. This is strictly more dispatching, not less, so it cannot regress any working code; it matches the OpenAI protocol's actual intent.
Private API removal on OpenAIResponseTarget. _find_last_pending_tool_call, _execute_call_section, and _make_tool_piece are no longer called from production code. Listed for changelog completeness — these were always private.

Tests and Documentation

New tests/unit/tools/ directory covering the decorator, parsing, LocalToolBackend, MCPClient (real stdio subprocess against a deterministic FastMCP fixture), and MCPToolBackend (multi-server routing, name-collision detection, name_prefix disambiguation, allowed_tools filtering, and concurrent-dispatch serialization).
New tests/unit/prompt_target/common/test_prompt_target_tool_loop.py asserting decorator order-of-execution against a fake target and using patch_central_database to verify per-message insert ordering, per-role labeling (assistant, tool), and per-data-type labeling (function_call, function_call_output) against the actual DB schema.
New tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py covering the migration onto @tool_loop, the deprecation warning on custom_functions, schema injection into request bodies, extra_body_parameters["tools"] precedence, and multi-call-per-turn sequential dispatch.
Existing test_openai_response_target_function_chaining.py sentinel tests pass unchanged: the back-compat property on _custom_functions keeps in-place mutations working.
New tests/unit/tools/test_inline_parser.py covering InlineToolCallParser across marker syntaxes (angle-bracket, pipe-delimited tag pair, square-bracket list payload), mode coverage (TRUNCATE_AT_LAST / TRUNCATE_AT_FIRST / EXTRACT_ALL / STRICT_TRAILING_EMPTY), and edge cases (empty input, malformed JSON, missing name field, multi-piece messages).
Extended tests/unit/prompt_target/target/test_azure_ml_chat_target.py and tests/unit/prompt_target/target/test_huggingface_chat_target.py with coverage for the new tool_parser / tool_backend kwargs: capability flipping, backend installation, request-body shape, no-tools backward compatibility, and (for the AzureML side) response materialization into function_call pieces.
Extended tests/unit/message_normalizer/test_chat_message_normalizer.py with full round-trip coverage of tool-piece serialization (function_call → assistant tool_calls, function_call_output → role=tool with tool_call_id).
New tests/integration/tools/test_red_teaming_with_tools.py running the real RedTeamingAttack against OpenAIResponseTarget with only the HTTP layer mocked. Tools are served by the real echo_mcp_server subprocess; the MCP stdio subprocess, AsyncExitStack lifecycle, canonical envelope round-trip, and RedTeamingAttack execution path all run unmocked.
New tests/integration/tools/test_azure_ml_with_tools_integration.py exercising the full PyRIT @tool_loop stack against AzureMLChatTarget with only the HTTP layer mocked. Asserts the canonical four-piece transcript (user → assistant function_call → tool function_call_output → assistant text) lands in Memory with matching call_id round-tripping.
No notebook/doc additions in this PR — follow-up scenarios PR will exercise the public API.

JupyText: not applicable (no notebook changes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…s for tool calling.

… into PromptTarget.send_prompt_async C4 lands the in-tree wiring for the generic tool-use loop introduced by C2/C3: - TargetCapabilities gains supports_tool_use: bool (default False) and CapabilityName.TOOL_USE for the corresponding enum value, matching the existing supports_X / "supports_X" naming convention used by every other capability. - TargetConfiguration grows tool_event_policy + tool_backend kwargs, both gettable/settable properties. The setter (and constructor) validate that a non-None tool_backend requires supports_tool_use=True; otherwise they raise ValueError immediately. ToolBackend / ToolEventPolicy imports are quoted + behind TYPE_CHECKING to keep pyrit.prompt_target.common from importing pyrit.tools eagerly. - PromptTarget.send_prompt_async picks up @tool_loop (below the existing @Final). The wrapper is a no-op when tool_event_policy is None, so every existing target keeps its current behavior. _tool_parser (property, default None) and _tool_schemas() (default []) are added on the base class as the two collaborators @tool_loop reads. - _permissive_configuration is updated to flip supports_tool_use=True alongside the other supports_X flags so the all-flags-on probe loop in test_discover_target_capabilities still sees every CapabilityName value as supported. tests/unit/tools/conftest.py drops the hand-decorated @tool_loop on _FakeToolTarget.send_prompt_async (which would now violate the base class's @Final) and instead wires policy + backend through TargetConfiguration. _tool_parser becomes a subclass property since the base class now defines one. Tests: - test_tool_event_policy.py adds U7 (capability flag wiring through the wrapper) plus dataclass field defaults and the TargetConfiguration validator. - test_prompt_target_tool_loop.py adds U1 / U2 (DB-end) / U8 / U9 / U11 exercised against a _ProductionShapedTarget that uses the real base-class _get_normalized_conversation_async (memory round-trip via patch_central_database). Plus default-_tool_parser / -_tool_schemas assertions. Validation: 8104 unit tests pass; pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Introduces a generic, target-agnostic tool-use primitive: a new pyrit/tools/ package with a tool_loop decorator (applied to PromptTarget.send_prompt_async), ToolBackend ABC with LocalToolBackend + MCPToolBackend (stdio) implementations, MCP client/server-spec types, ToolCallParser protocol, a new TargetCapabilities.supports_tool_use flag, and TargetConfiguration.tool_event_policy / tool_backend fields. Two new exception classes (ToolCallNotSupported, ToolCallLoopLimitExceeded) carry partial-conversation state. The PR is marked DRAFT; the OpenAI target migrations described in the PR description are not yet present in the diff.

Changes:

New pyrit.tools package (ToolCall, tool_loop, backends, MCP client/specs, parsers)
Base PromptTarget.send_prompt_async made @final @tool_loop, with default no-op _tool_parser / _tool_schemas; capability + configuration fields added
mcp>=1.0,<2 added as a core (non-optional) dependency; new unit tests against a real FastMCP stdio subprocess

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
pyrit/tools/init.py	Public re-exports for the new tools package
pyrit/tools/models.py	`ToolCall`, `ToolEventPolicy`, `tool_loop` decorator core
pyrit/tools/backend.py	`ToolBackend` ABC with default sequential dispatch
pyrit/tools/local_backend.py	In-process callable backend with error envelopes
pyrit/tools/parsers.py	`ToolCallParser` protocol + canonical filter helper
pyrit/tools/mcp_client.py	Stdio `MCPClient`, three `MCPServerSpec` variants (only Local implemented)
pyrit/tools/mcp_backend.py	Multi-server routing, name-prefixing, allow-listing
pyrit/prompt_target/common/prompt_target.py	Apply `@tool_loop` to `send_prompt_async`; default tool hooks
pyrit/prompt_target/common/target_capabilities.py	New `TOOL_USE` capability and `supports_tool_use` flag
pyrit/prompt_target/common/target_configuration.py	New `tool_event_policy` / `tool_backend` fields + validators
pyrit/prompt_target/common/discover_target_capabilities.py	Permissive profile enables `supports_tool_use`
pyrit/exceptions/exception_classes.py, init.py	`ToolCallNotSupported`, `ToolCallLoopLimitExceeded`
pyproject.toml, uv.lock	`mcp>=1.0,<2` added as a core dependency + transitive deps
tests/unit/tools/*	Decorator, policy wiring, local backend, MCP client/backend, real stdio echo server fixture

…nd Response target This commit is intentionally empty. It records a scope decision made in response to PR review feedback. No code changes - the C5 working set was uncommitted and has been reverted. # Why we're dropping C5 Review feedback raised two concerns the original C5 did not address: 1. **Duplication against OpenAIResponseTarget.** The Response target already implements an agentic tool loop (openai_response_target.py lines 590-626), the canonical function_call envelope (lines 666-674), a Python-callable dispatch registry (custom_functions), and an allow-list-ish hook (fail_on_missing_function). C5 layered a parallel implementation on top for the Chat target instead of converging both targets onto one stack. 2. **Chat Completions is on its way out.** OpenAI has publicly framed the Responses API as the long-term replacement for Chat Completions. Investing in tool-call plumbing for a deprecated endpoint ages out fast and obscures the actual value of this PR. The right framing is: this PR is not "tool calling for all targets." It is "pluggable tool-execution backends + a client-side agentic loop for non-Responses-API targets." The Responses API is one transport; this PR is the in-process abstraction that works for every transport. # What survives unchanged C1 (mcp SDK dep), C2 (tools/ scaffold + LocalToolBackend), C3 (MCPClient + MCPToolBackend + Docker stub), and C4 (capability flag + @tool_loop wired on the base class) all remain shipped. The genuinely-novel work - local stdio MCP, pluggable backend ABC, ToolEventPolicy (RAISE / EXECUTE / RETURN_RAW), allowed_tools - is unaffected. # The new design **One agentic loop driver.** The @tool_loop decorator on PromptTarget.send_prompt_async (shipped in C4) is the only loop driver. Every target's _send_prompt_to_target_async returns exactly ONE Message per call. The decorator stitches iterations into the response list. **One tool execution layer.** Every dispatched call flows through ToolBackend.dispatch_async(call) -> envelope. Backends (LocalToolBackend for Python callables, MCPToolBackend for stdio MCP subprocesses, future DockerMCPToolBackend, future CompositeToolBackend) are interchangeable behind a single ABC. **Migrate OpenAIResponseTarget onto the decorator (new C5).** Delete the in-class while loop (lines 590-626). _send_prompt_to_target_async becomes "build body, call API, parse response into one Message, return." Add _tool_parser returning CanonicalEnvelopeParser (extracts only function_call pieces; reasoning, mcp_call, web_search_call, etc. continue to pass through to Memory without dispatch). Translate the configured backend's schemas into the Responses-API tools shape inside _construct_request_body (without clobbering an existing extra_body_parameters["tools"]). Wrap custom_functions as a LocalToolBackend internally with DeprecationWarning(removed_in="0.16.0"), preserving the existing fail_on_missing_function semantics. **Integration tests (new C6).** Rewrite to use the Response target as the sole OpenAI tool-calling path, plus end-to-end scenario tests against the real echo_mcp_server. **OpenAIChatTarget receives no tool-calling support in this PR.** A future PR can pull Chat onto the same abstractions if anyone still wants it, but the recommended OpenAI tool-calling path becomes the Responses API. # Risks * Behavior-parity on the Response target: callers that rely on `len(send_prompt_async(...)) == iterations` rather than scanning piece types will need updating. Existing function-chaining tests act as sentinels. * `custom_functions` deprecation must preserve `fail_on_missing_function` semantics through the LocalToolBackend wrapper. * Response parser must continue to round-trip non-`function_call` piece types (reasoning, mcp_call, etc.) to Memory without dispatching. * `extra_body_parameters["tools"]` takes precedence over backend-derived tools so existing manual configs keep working. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

C6 collapses the Response target in-class agentic loop into the @tool_loop decorator shipped in C4, and routes tool dispatch through LocalToolBackend (wrapping the existing custom_functions registry as a deprecation shim). # What changed - _send_prompt_to_target_async no longer runs a while loop. It now returns exactly one Message per call. The agentic loop is driven by @tool_loop on the base class. - Added _tool_parser returning CanonicalEnvelopeParser from pyrit/tools/parsers.py. The parser extracts only function_call pieces; reasoning, mcp_call, web_search_call, computer_call, local_shell_call, etc. pass through to Memory unchanged because the parser ignores them and the decorator exits cleanly on the empty parse. - Added _tool_schemas() translating the configured backend schemas into the Responses-API tools shape. - _construct_request_body injects tools=... when the backend has schemas. User-supplied extra_body_parameters["tools"] takes precedence. - supports_tool_use=True on _DEFAULT_CONFIGURATION. - custom_functions= now emits DeprecationWarning(removed_in="0.16.0"). Internally wraps into a LocalToolBackend. A LocalToolBackend is always installed (populated or empty) so legacy target._custom_functions[name]=fn mutations keep affecting dispatch via a back-compat property. - Constructor deep-copies the class-level _DEFAULT_CONFIGURATION before mutating it (PromptTarget.get_default_configuration returns the singleton, so otherwise one instances tool_backend would leak across every other instance). # What did NOT change The legacy _find_last_pending_tool_call, _execute_call_section, and _make_tool_piece helpers remain in place. They are no longer called from production code, but existing tests still cover them; cleanup is deferred to the same follow-up PR that removes the custom_functions kwarg after the 0.16.0 deprecation window. # Tests - New tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py with 7 tests covering deprecation warning, dispatch through user-supplied LocalToolBackend, schema injection, extra_body precedence, no-backend behavior, and reasoning-only passthrough. - All 5 existing function-chaining sentinel tests in test_openai_response_target_function_chaining.py pass unchanged: the back-compat _custom_functions property keeps in-place mutations working. 8131 unit tests green; pre-commit clean (ruff format, ruff check, ty). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

C7 adds end-to-end integration coverage of the @tool_loop decorator, MCPToolBackend, and MCPClient stack against the real echo_mcp_server subprocess. Only the OpenAI Responses HTTP layer is mocked; the MCP stdio subprocess, AsyncExitStack lifecycle, canonical envelope round-trip, and RedTeamingAttack execution path all run unmocked. # What ships tests/integration/tools/test_red_teaming_with_tools.py with three tests: 1. test_red_teaming_response_target_with_mcp_echo - end-to-end smoke test. RedTeamingAttack drives OpenAIResponseTarget configured with a MCPToolBackend pointing at echo_mcp_server. The Responses API mock returns one function_call followed by a stop response. Asserts the tool call actually reaches the MCP subprocess and the result lands back in the second API call as a function_call_output. 2. test_red_teaming_persists_canonical_transcript_in_memory - verifies the canonical envelope contract (plan section 13). Reads the conversation back from Memory after attack.execute_async returns and asserts the function_call and function_call_output pieces are present, in order, with matching call_ids. 3. test_red_teaming_dispatches_all_tool_calls_per_turn - regression test for the intentional behavior change from C6. The pre-C6 in-class loop in OpenAIResponseTarget only dispatched the LAST function_call per turn; the @tool_loop decorator now dispatches every call in declaration order. Issues both echo and add in one response and asserts both results land in the next API call. # Test infrastructure - LocalMCPServerSpec uses command=sys.executable + args=(echo_server,). - Mock objective scorer returns a true score so RedTeamingAttack exits cleanly after one turn. - Mock adversarial target returns a single scripted prompt wrapped as list[Message] (PromptTarget.send_prompt_async contract). - Score, ComponentIdentifier, and PromptTarget MagicMock(spec=...) usage matches the existing tests/unit/executor/attack patterns. All three integration tests pass; pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a parser that walks text MessagePieces for marker-delimited JSON blocks of the form {"name": ..., "arguments": {...}} and emits canonical ToolCall instances. Marker pattern, call_id prefix, and surrounding-text policy (truncate / extract-all / strict) are all constructor-controlled so a single class covers angle-bracket, pipe-delimited tag pair, and other chat-template syntaxes. The parser is the F1 (per plan) piece that lets non-Responses-API targets participate in PyRIT's @tool_loop without a per-vendor parser implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

TargetConfiguration.as_identifier_params() now snapshots the configured tool_event_policy (behavior + max_tool_iterations) and tool_backend (backend class + sorted list of advertised tool names). Two targets that differ only in their tool backend now get distinct identifiers, which downstream consumers rely on to route by target identity. Schema serialization is best-effort: backends with shape-quirky schemas that lack a recoverable 'name' field are silently dropped from the identifier surface. Exact callables and transports are not serialized because they are not deterministic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PyRIT's docs build uses MyST, not reStructuredText, so reST roles like :class:\Foo\ render as literal text in the rendered docs and mismatch the rest of the codebase. Convert all roles in the new pyrit/tools/ module to plain double-backtick code spans, and drop the in-flight commit-numbering references (C1/C2/...) that were carry-overs from the shipping plan and no longer mean anything in source. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…aises Three small cleanups in the new tools test suite: 1. Remove @pytest.mark.asyncio decorators -- the project sets asyncio_mode='auto' in pyproject.toml so the marker is a no-op that creates the appearance of opt-in async test discovery. 2. Narrow pytest.raises((AttributeError, Exception)) to dataclasses.FrozenInstanceError on the two frozen-dataclass guards in test_mcp_client.py. The previous pattern matched every Exception and would have masked unrelated regressions. 3. Drop in-flight C1/C2/.../C10 commit-id strings from test docstrings; they referenced the shipping plan, not the source tree, and read as noise after the commits land. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

rlundeen2 · 2026-05-28T20:35:33Z

Cool idea; can we have a design meeting?

Copilot

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (1)

pyrit/prompt_target/openai/openai_response_target.py:23

Deprecation warnings should go through pyrit.common.deprecation.print_deprecation_message rather than calling warnings.warn directly, to keep formatting/stacklevel consistent and filterable across the codebase.

import json
import logging
import warnings
from collections.abc import Awaitable, Callable, MutableSequence
from enum import Enum
from typing import (
    Any,
    Literal,
    Optional,
    cast,
)

from openai.types.shared import ReasoningEffort

from pyrit.common.data_url_converter import convert_local_image_to_data_url_async
from pyrit.exceptions import (
    EmptyResponseException,
    PyritException,
    pyrit_target_retry,
)

+        for _ in range(max_iter):
+            responses_this_turn = await self._send_prompt_to_target_async(
+                normalized_conversation=normalized_conversation,
+            )
+            all_responses.extend(responses_this_turn)
+
+            if parser is None:
+                return all_responses
+
+            last_response = responses_this_turn[-1]


+            results = await backend.dispatch_all_sequential_async(pending_calls)
+            tool_msg = _build_function_call_output_message(
+                reference_piece=last_response.message_pieces[0],
+                outputs=results,
+            )
+            all_responses.append(tool_msg)
+            normalized_conversation = list(normalized_conversation) + [last_response, tool_msg]
+


 from pyrit.models.json_response_config import _JsonResponseConfig
 from pyrit.prompt_target.common.target_capabilities import CapabilityName, TargetCapabilities
 from pyrit.prompt_target.common.target_configuration import TargetConfiguration
+from pyrit.tools import ToolCallParser, tool_loop



+        if custom_functions:
+            warnings.warn(
+                "OpenAIResponseTarget(custom_functions=...) is deprecated and will be "
+                "removed in 0.16.0. Configure tool_backend on TargetConfiguration "
+                "instead (e.g. LocalToolBackend(callables=..., schemas=..., "
+                "fail_on_missing_function=...)).",
+                DeprecationWarning,
+                stacklevel=2,
+            )


+@pytest.mark.asyncio
+async def test_red_teaming_response_target_with_mcp_echo(patch_central_database):


+class TestToolBackendDispatch:
+    """The modern path: pass tool_backend via TargetConfiguration."""
+
+    @pytest.mark.asyncio


+class TestToolSchemasInjection:
+    """_construct_request_body injects backend schemas when present."""
+
+    @pytest.mark.asyncio


+        assert body["tools"][0]["type"] == "function"
+        assert body["tools"][0]["name"] == "get_weather"
+
+    @pytest.mark.asyncio


+        )
+        assert body["tools"] == legacy
+
+    @pytest.mark.asyncio


+    must therefore see an empty parse and exit cleanly.
+    """
+
+    @pytest.mark.asyncio


…all_output pieces ChatMessageNormalizer raised on function_call / function_call_output data types, which meant any target whose wire format runs through it (AzureMLChatTarget, HuggingFaceChatTarget, OpenAIChatTarget) could not round-trip a tool-call conversation through @tool_loop. Adds a per-message tool-message detector that converts function_call pieces to an assistant message with content=null and a ToolCall populated from the canonical envelope, and function_call_output pieces to a role=tool message with tool_call_id set from the envelope's call_id and content set to the output. Matches the OpenAI Chat Completions wire shape. Also fixes ChatMessage.ToolCall whose 'function' field was typed as a bare string; OpenAI ships it as a nested object with name + arguments. ChatMessage.content now permits None for assistant messages that carry only tool_calls (the OpenAI API requires content=null in that shape). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The base default for _tool_schemas() now reads self.configuration.tool_backend.schemas verbatim. Subclasses that need wire-format wrapping (currently only OpenAIResponseTarget, which prepends type=function) override the method and reuse the base via super() to get the raw schemas. Removes a small but real duplication risk for the upcoming AzureMLChatTarget / HuggingFaceChatTarget tool-calling paths, which would otherwise each reimplement the 'read schemas from configured backend' boilerplate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AzureMLChatTarget now participates in PyRIT's @tool_loop when callers supply a ToolCallParser at construction. The parser flips supports_tool_use=True on the default capabilities so callers don't need to construct a custom_configuration just to opt in. A convenience tool_backend kwarg installs the backend onto the configuration in one step. Wire format: _tool_schemas() wraps the backend's schemas in the OpenAI Chat Completions tools shape (with each schema nested under a "function" key). _construct_http_body_async injects the wrapped schemas as a top-level tools field when non-empty. Deployments unwrap that envelope before passing to tokenizer.apply_chat_template; see plan section 12.9 for the contract. Response handling: _complete_chat_async now returns the parsed JSON body (was: string output). The new _materialize_response walks the response dict and emits one text MessagePiece for the output field plus one function_call MessagePiece per envelope in the tool_calls field; CanonicalEnvelopeParser then finds those pieces in the loop's next iteration. The no-tools path is unchanged: requests without tool_parser produce byte-identical request bodies, verified by test_request_body_omits_tools_key_when_no_backend. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Same shape as the AzureMLChatTarget F2 change: callers supply a ToolCallParser at construction; the parser flips supports_tool_use=True on the default capabilities so no custom_configuration is required to opt in. A convenience tool_backend kwarg installs the backend onto the configuration in one step. Wire format differs from AzureML because HuggingFace runs the model in-process via the transformers library: * _tool_schemas() returns the bare backend schemas (no OpenAI envelope) because tokenizer.apply_chat_template expects bare function schemas, not the Chat Completions wrapper. * _apply_chat_template forwards tools= into apply_chat_template when schemas are present; the model's tool-trained chat template renders the model-family-specific tools block (Qwen wraps in <tools>...</tools>, Llama uses a system-message preamble, etc.). * _build_chat_messages now walks every piece in each message and converts function_call / function_call_output envelopes to the chat-template tool message shape (assistant + tool_calls list, role=tool + tool_call_id) so the model sees the canonical in-context tool conversation. The no-tools path is unchanged: without tool_parser, no tools key is passed to apply_chat_template and no tool message translation runs. The user-supplied tool_parser walks the response text for inline tool-call markers; InlineToolCallParser is the typical choice for ChatML-style angle-bracket markers, but the user can supply any ToolCallParser implementation (different marker regex, different mode). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds tests/integration/tools/test_azure_ml_with_tools_integration.py exercising the full PyRIT @tool_loop stack against AzureMLChatTarget with only the HTTP layer mocked. The mocked responses match the §12.9.2 canonical envelope shape: first response carries a tool_calls field that the loop dispatches via LocalToolBackend; second response is the final assistant text. Asserts the canonical four-piece transcript shape persists in Memory: [user text, assistant function_call, tool function_call_output, assistant text], with the call_id round-tripping between the assistant function_call piece and the tool function_call_output piece, and the tool output reflecting the actual dispatched callable's return value. Also covers the no-tools backward-compatibility path: a target constructed without tool_parser produces a request body that has no tools key, proving the F2 changes do not regress existing AzureML deployments that don't carry the patched scoring script. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous cleanup commit (31ed2fb) removed the pyrit/tools/ package and tests/unit/tools/ directory, but several tool-calling changes from PR microsoft#1811 (MCP) remained mixed in: - pyrit/exceptions: ToolCallNotSupported and ToolCallLoopLimitExceeded - pyrit/prompt_target/common/: @tool_loop decoration on send_prompt_async, supports_tool_use capability, tool_event_policy and tool_backend slots on TargetConfiguration - pyrit/prompt_target/openai/openai_response_target.py: migration onto @tool_loop + LocalToolBackend (the in-class agentic loop was removed in favor of the decorator) - tests/integration/tools/ and tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py - pyproject.toml + uv.lock: mcp Python SDK dependency All of the above are reverted to origin/main. The adversarial benchmark refactor (this PR's actual scope) is unaffected; 128 targeted unit tests across openai_response_target, function_chaining, and scenario/benchmark still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Victor Valbuena and others added 4 commits May 26, 2026 13:59

Add mcp Python SDK dependency

7a664c5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' into MCP

d68d75c

Add tools/ package with tool_loop decorator and CallableToolBackend

c7d65d3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Addition of pyrit/tools package and unit tests. Introduces base model…

39752ac

…s for tool calling.

ValbuenaVC changed the title ~~[DRAFT] FEAT: Tool Use via MCP~~ [DRAFT] FEAT: Tool Use + MCP May 26, 2026

Victor Valbuena and others added 3 commits May 26, 2026 15:56

Addition of MCP components including the MCP client and tool backend.

e8ab8ff

Merge branch 'main' into MCP

f566914

ValbuenaVC requested review from Copilot, hannahwestra25 and romanlutz and removed request for hannahwestra25 May 27, 2026 20:41

Copilot started reviewing on behalf of ValbuenaVC May 27, 2026 20:41 View session

Copilot AI reviewed May 27, 2026

View reviewed changes

ValbuenaVC and others added 8 commits May 28, 2026 12:10

Merge branch 'main' into MCP

b5fa035

ValbuenaVC requested a review from Copilot May 28, 2026 20:15

Copilot started reviewing on behalf of ValbuenaVC May 28, 2026 20:15 View session

ValbuenaVC changed the title ~~[DRAFT] FEAT: Tool Use + MCP~~ FEAT: Tool Use + MCP May 28, 2026

ValbuenaVC marked this pull request as ready for review May 28, 2026 20:16

Copilot AI reviewed May 28, 2026

View reviewed changes

Merge branch 'main' into MCP

9b80e14

ValbuenaVC requested a review from Copilot May 29, 2026 17:37

Copilot started reviewing on behalf of ValbuenaVC May 29, 2026 17:37 View session

Copilot AI reviewed May 29, 2026

View reviewed changes

Victor Valbuena and others added 5 commits May 29, 2026 11:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: Tool Use + MCP#1811

FEAT: Tool Use + MCP#1811
ValbuenaVC wants to merge 21 commits into
microsoft:mainfrom
ValbuenaVC:MCP

ValbuenaVC commented May 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

rlundeen2 commented May 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@pytest.mark.asyncio
		async def test_red_teaming_response_target_with_mcp_echo(patch_central_database):

Conversation

ValbuenaVC commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Compatibility

Tests and Documentation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

rlundeen2 commented May 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ValbuenaVC commented May 26, 2026 •

edited

Loading